over-parameterized regime
Convergence beyond the over-parameterized regime using Rayleigh quotients
In this paper, we present a new strategy for proving convergence of deep learning architectures to zero training (or even test) loss under gradient flow. Our analysis is centered on the notion of Rayleigh quotients, which we use to prove Kurdyka-Łojasiewicz inequalities for a broader class of neural network architectures and loss functions. We show that Rayleigh quotients provide a unified view of several convergence analysis techniques in the literature. Our strategy yields proofs of convergence for various examples of parametric learning. In particular, our analysis requires neither the number of parameters to tend to infinity nor the number of samples to be finite, and thus extends to test-loss minimization and beyond the over-parameterized regime.
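As a minimal illustration of the kind of argument the abstract points to (the paper's exact definition of the Rayleigh quotient may differ), suppose the ratio $R(\theta) = \|\nabla L(\theta)\|^{2} / L(\theta)$ admits a lower bound $\mu > 0$ along a gradient-flow trajectory $\dot{\theta}(t) = -\nabla L(\theta(t))$. Then
$$\frac{d}{dt} L(\theta(t)) = \langle \nabla L(\theta(t)), \dot{\theta}(t) \rangle = -\|\nabla L(\theta(t))\|^{2} \le -\mu\, L(\theta(t)),$$
and Grönwall's inequality gives $L(\theta(t)) \le L(\theta(0))\, e^{-\mu t}$, i.e., the loss decays to zero exponentially fast. A Kurdyka-Łojasiewicz inequality plays the role of this lower bound, and neither the number of parameters nor the number of samples enters the argument.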
Preconditioned Gradient Descent for Over-Parameterized Nonconvex Matrix Factorization
In practical instances of nonconvex matrix factorization, the rank of the true solution $r^{\star}$ is often unknown, so the rank $r$ of the model can be over-specified as $r > r^{\star}$. This over-parameterized regime of matrix factorization significantly slows down the convergence of local search algorithms, from a linear rate when $r = r^{\star}$ to a sublinear rate when $r > r^{\star}$. We propose an inexpensive preconditioner for the matrix sensing variant of nonconvex matrix factorization that restores the convergence rate of gradient descent to linear, even in the over-parameterized case, while also making it agnostic to possible ill-conditioning in the ground truth. Classical gradient descent slows down in a neighborhood of the solution because the model matrix factor must become singular. Our key result is that this singularity can be corrected by $\ell_{2}$ regularization with a specific range of values for the damping parameter; in fact, a good damping parameter can be estimated inexpensively from the current iterate. The resulting algorithm, which we call preconditioned gradient descent or PrecGD, is stable under noise and converges linearly to an information-theoretically optimal error bound. Our numerical experiments show that PrecGD works equally well in restoring the linear convergence of other variants of nonconvex matrix factorization in the over-parameterized regime.
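A rough NumPy sketch of a PrecGD-style iteration on a small matrix-sensing instance follows; the damping rule $\lambda = \sqrt{f(X)}$, the step size, and the problem sizes are illustrative assumptions rather than the paper's exact prescription.

```python
# Sketch of a PrecGD-style update for over-parameterized matrix sensing (r > r_star).
# The damping rule lambda = sqrt(f(X)) below is an assumed, illustrative choice.
import numpy as np

rng = np.random.default_rng(0)
n, r_star, r, m = 15, 2, 5, 600          # dimension, true rank, over-specified rank, measurements

# Ground truth M = Z Z^T and symmetric Gaussian sensing matrices A_i.
Z = rng.normal(size=(n, r_star))
M = Z @ Z.T
A = rng.normal(size=(m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2
b = np.einsum('kij,ij->k', A, M)

def loss_and_grad(X):
    """Matrix-sensing loss f(X) = (1/2m) * sum_i (<A_i, X X^T> - b_i)^2 and its gradient in X."""
    res = np.einsum('kij,ij->k', A, X @ X.T) - b
    f = 0.5 * np.mean(res ** 2)
    grad = 2.0 * np.einsum('k,kij->ij', res, A) @ X / m
    return f, grad

X = rng.normal(size=(n, r)) * 0.1        # over-parameterized initialization
eta = 0.1
for t in range(500):
    f, g = loss_and_grad(X)
    lam = np.sqrt(f) + 1e-12                      # assumed damping: scale with current loss
    P = np.linalg.inv(X.T @ X + lam * np.eye(r))  # right preconditioner (X^T X + lambda I)^{-1}
    X = X - eta * g @ P                           # PrecGD-style update
print(f"final loss: {loss_and_grad(X)[0]:.3e}")
```

Replacing the preconditioned step with plain gradient descent (`X = X - eta * g`) in the same script makes the slowdown in the over-specified directions visible.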
High-dimensional Analysis of Synthetic Data Selection
Parham Rezaei, Filip Kovacevic, Francesco Locatello, Marco Mondelli
Despite progress in the development of generative models, their usefulness in creating synthetic data that improves the prediction performance of classifiers has been called into question. Beyond heuristic principles such as "synthetic data should be close to the real data distribution", it is not clear which specific properties affect the generalization error. Our paper addresses this question through the lens of high-dimensional regression. Theoretically, we show that, for linear models, the covariance shift between the target distribution and the distribution of the synthetic data affects the generalization error but, surprisingly, the mean shift does not. Furthermore, we prove that, in some settings, matching the covariance of the target distribution is optimal. Remarkably, these theoretical insights from linear models carry over to deep neural networks and generative models. We empirically demonstrate that the covariance matching procedure (matching the covariance of the synthetic data with that of data from the target distribution) performs well against several recent approaches for synthetic data selection, across training paradigms, architectures, datasets, and generative models used for augmentation.
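One simple way to instantiate covariance matching as a selection rule is sketched below; the greedy Frobenius-norm criterion and the toy data are illustrative assumptions, not the authors' exact procedure.

```python
# Illustrative covariance-matching selection: greedily pick synthetic samples whose
# running empirical covariance stays closest (Frobenius norm) to the real-data covariance.
import numpy as np

def select_by_covariance_matching(synthetic, real, k):
    """Return indices of k synthetic samples whose empirical covariance best matches real data."""
    target_cov = np.cov(real, rowvar=False)
    selected, remaining = [], list(range(len(synthetic)))
    for _ in range(k):
        best_i, best_err = None, np.inf
        for i in remaining:
            subset = synthetic[selected + [i]]
            cov = np.cov(subset, rowvar=False) if len(subset) > 1 else np.zeros_like(target_cov)
            err = np.linalg.norm(cov - target_cov, 'fro')
            if err < best_err:
                best_i, best_err = i, err
        selected.append(best_i)
        remaining.remove(best_i)
    return selected

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 5)) @ np.diag([3, 2, 1, 1, 0.5])                # target-distribution samples
synthetic = rng.normal(size=(500, 5)) * rng.uniform(0.5, 4, size=(500, 1))  # heterogeneous synthetic pool
idx = select_by_covariance_matching(synthetic, real, k=50)
print(np.linalg.norm(np.cov(synthetic[idx], rowvar=False) - np.cov(real, rowvar=False), 'fro'))
```

The selected subset would then be appended to the real training set for augmentation.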